A Prediction Divergence Criterion for Model Selection

Authors

  • Stéphane Guerrier
  • Maria-Pia Victoria-Feser
Abstract

The problem of model selection is inevitable in an increasingly large number of applications involving partial theoretical knowledge and vast amounts of information, such as medicine, biology or economics. The associated techniques are intended to determine which variables are “important” to “explain” a phenomenon under investigation. The terms “important” and “explain” can have very different meanings according to the context and, in fact, model selection can be applied to any situation where one tries to balance variability with complexity. In this paper, we introduce a new class of error measures and of model selection criteria, to which many well-known selection criteria belong. Moreover, this class enables us to derive a novel criterion, based on a divergence measure between the predictions produced by two nested models, called the Prediction Divergence Criterion (PDC). Our selection procedure is developed for linear regression models, but has the potential to be extended to other models. We demonstrate that, under some regularity conditions, it is asymptotically loss efficient and can also be consistent. In the linear case, the PDC is a counterpart to Mallows' Cp but with a lower asymptotic probability of overfitting. In a case study and by means of simulations, the PDC is shown to be particularly well suited to “sparse” settings with correlated covariates, which we believe to be common in real applications.
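For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of Mallows' Cp applied to a sequence of nested linear regression models. It is not the PDC, whose definition is not given in the abstract. The simulated data, the column ordering, and the helper names `full_model_sigma2` and `mallows_cp` are assumptions made for illustration only. It uses the standard form Cp = RSS_k / sigma2 - n + 2k, with sigma2 estimated from the full model.

```python
import numpy as np

def full_model_sigma2(X, y):
    """Unbiased residual-variance estimate from the full model (hypothetical helper)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return rss / (n - p)

def mallows_cp(X, y, n_cols, sigma2):
    """Mallows' Cp for the nested model that keeps the first n_cols columns of X."""
    n = len(y)
    Xk = X[:, :n_cols]
    beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    rss = float(np.sum((y - Xk @ beta) ** 2))
    return rss / sigma2 - n + 2 * n_cols

# Simulated sparse setting (assumed for illustration): only the intercept
# and the first two covariates have non-zero coefficients.
rng = np.random.default_rng(0)
n, p = 200, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

sigma2 = full_model_sigma2(X, y)
cp_values = [mallows_cp(X, y, k, sigma2) for k in range(1, p + 1)]
best_k = int(np.argmin(cp_values)) + 1  # number of columns in the selected model
```

By construction, Cp for the full model equals its number of parameters, which gives a quick sanity check. Models that omit a truly active covariate receive a much larger Cp, while models with a few superfluous covariates are penalised only mildly, which is the overfitting tendency the abstract refers to.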


Similar resources

MATHEMATICAL ENGINEERING TECHNICAL REPORTS Bayesian prediction and model selection for locally asymptotically mixed normal models

The METR technical reports are published as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and c...


A Large-Sample Model Selection Criterion Based on Kullback's Symmetric

The Akaike information criterion, AIC, is a widely known and extensively used tool for statistical model selection. AIC serves as an asymptotically unbiased estimator of a variant of Kullback's directed divergence between the true model and a fitted approximating model. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternate directe...


On Measures of Information and Divergence and Model Selection Criteria

In this paper we discuss measures of information and divergence and model selection criteria. Three classes of measures, Fisher-type, divergence-type and entropy-type measures, are discussed and their properties are presented. Information through censoring and truncation is presented and model selection criteria are investigated including the Akaike Information Criterion (AIC) and the Diverge...


Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...


Testing Ecological Theory Using the Information-theoretic Approach: Examples and Cautionary Results

Ecologists are increasingly applying model selection to their data analyses, primarily to compare regression models. Model selection can also be used to compare mechanistic models derived from ecological theory, thereby providing a formal framework for testing the theory. The Akaike Information Criterion (AIC) is the most commonly adopted criterion used to compare models; however, its performan...



Publication date: 2015